Matrix Multiplication Implementation in the MOLEN Polymorphic Processor
Authors
Abstract
Floating-point matrix multiplication is arguably the most important kernel routine in many scientific applications, so its efficient implementation is crucial for the overall performance of any computer system that targets scientific computing. In this paper, we propose a holistic solution for accelerating matrix multiplication on reconfigurable hardware using the MOLEN polymorphic processor. The MOLEN polymorphic processor consists of a general-purpose processor (GPP) tightly coupled with a reconfigurable coprocessor, which can implement arbitrary functions in hardware as custom computing units (CCUs). We implemented matrix multiplication as a CCU for the MOLEN processor and realized it on real reconfigurable hardware. The software interface is defined by the MOLEN programming paradigm, which makes integrating the hardware accelerator at the application level straightforward: a matrix multiplication is started on the CCU by a MOLEN execute instruction, and the required operation parameters are transferred through exchange registers. For our experiments, we employ Xilinx Virtex-II Pro technology. An XC2VP30 device proved large enough to hold the MOLEN processor infrastructure and a CCU consisting of 9 processing elements running at 100 MHz. A benchmark application on this system closely approaches the theoretical peak performance of 1.8 GFLOPS. Furthermore, we analyzed the performance for different design parameters and problem sizes. The proposal is scalable and, owing to its polymorphic nature, allows optimal configurations for different application contexts and chip sizes.
Keywords: Field Programmable Gate Arrays, Floating point arithmetic, Matrix multiplication, Reconfigurable architectures.
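The abstract describes the invocation flow (operation parameters written to exchange registers, then a MOLEN execute instruction starts the CCU) and a peak of 1.8 GFLOPS for 9 processing elements at 100 MHz. The Python sketch below is a software-only model of that flow, not the authors' hardware design. The class `MatMulCCU`, the exchange-register keys `a`, `b` and `n`, and the round-robin assignment of output elements to PEs are hypothetical choices made for illustration. The peak-rate function assumes each PE completes one multiply and one add (2 FLOPs) per cycle. That assumption is consistent with 9 × 2 × 100 MHz = 1.8 GFLOPS, but the abstract does not state it explicitly.

```python
NUM_PES = 9        # processing elements in the CCU (from the abstract)
CLOCK_MHZ = 100.0  # CCU clock frequency (from the abstract)


def peak_gflops(pes: int, clock_mhz: float) -> float:
    """Peak rate, ASSUMING every PE completes one multiply and one add
    (2 FLOPs) per clock cycle."""
    return pes * 2 * clock_mhz / 1000.0


class MatMulCCU:
    """Hypothetical software model of the matrix-multiplication CCU."""

    def __init__(self, pes: int = NUM_PES):
        self.pes = pes
        # Stand-in for the exchange registers; Python references play the
        # role of the memory addresses a real GPP would write here.
        self.xregs = {}
        self.macs_per_pe = [0] * pes  # multiply-accumulates issued per PE

    def execute(self):
        """Stand-in for the MOLEN execute instruction: read the operands
        from the exchange registers and return C = A * B (n x n)."""
        a, b, n = self.xregs["a"], self.xregs["b"], self.xregs["n"]
        c = [[0.0] * n for _ in range(n)]
        for i in range(n):
            for j in range(n):
                pe = (i * n + j) % self.pes  # assumed round-robin owner of C[i][j]
                acc = 0.0
                for k in range(n):
                    acc += a[i][k] * b[k][j]  # one multiply-accumulate on that PE
                    self.macs_per_pe[pe] += 1
                c[i][j] = acc
        return c


# Usage: multiply a 4 x 4 matrix by the identity.
n = 4
a = [[float(i * n + j + 1) for j in range(n)] for i in range(n)]
identity = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]

ccu = MatMulCCU()
ccu.xregs.update(a=a, b=identity, n=n)  # transfer the operation parameters
c = ccu.execute()                       # start the (modelled) CCU
print(c == a)                           # True: A * I == A
print(ccu.macs_per_pe)                  # [8, 8, 8, 8, 8, 8, 8, 4, 4]
print(peak_gflops(NUM_PES, CLOCK_MHZ))  # 1.8
```

On the real system, the execute step would start the hardware CCU rather than run a Python loop. The model only illustrates how parameters reach the unit and how n³ multiply-accumulates could be spread over the PEs. With only 16 output elements, the round-robin mapping leaves two PEs half as busy, which hints at why problem size affects how closely the peak is approached.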
Similar sources
Polymorphic AES Encryption Implementation
This paper presents a hybrid hardware-software implementation of the AES encryption algorithm on the MOLEN polymorphic processor [1]. To combine the advantages of both software and hardware implementations, the application code has been divided into two computational approaches, software and hardware. Only the main ciphering function, the more computationally demanding component,...
A Hardware Implementation of a Binary Neural Image Processor
This paper presents the work that has resulted in the SAT processor, a dedicated hardware implementation of a binary neural image processor. The SAT processor is aimed specifically at supporting the ADAM algorithm and is currently being integrated into a new version of the C-NNAP parallel image processor. The SAT processor performs binary matrix multiplications, a task that is computationally c...
Design and Implementation of Field Programmable Gate Array Based Baseband Processor for Passive Radio Frequency Identification Tag (TECHNICAL NOTE)
In this paper, an Ultra High Frequency (UHF) baseband processor for a passive tag is presented. It proposes a Radio Frequency Identification (RFID) tag digital baseband architecture that is compatible with the EPC C C2/ISO18000-6B protocol. Several design approaches, such as clock gating, clock strobe design and clock management, are used. In order to reduce the area Decimal Matrix C...
The Virtex II Pro™ MOLEN Processor
We use the Xilinx Virtex II Pro™ technology as a prototyping platform to design a MOLEN polymorphic processor, a custom computing machine based on the co-processor architectural paradigm. The PowerPC embedded in the FPGA operates as the general-purpose (core) processor, and the reconfigurable fabric is used as a reconfigurable co-processor. The paper focuses on hardware synthesis results and ex...